Determining pose for use with digital watermarking, fingerprinting and augmented reality

ABSTRACT

Image recognition and augmented reality experiences utilize auxiliary data extracted from an image or video, or image fingerprints, or a combination of both. One claim recites a method of controlling a device, the device comprising a camera and a display screen. The method comprises: receiving image data captured by the camera; modifying received image data to compensate for distortion caused by capture positioning of the camera relative to an imaged subject; analyzing modified imagery to detect an encoded signal therefrom; extracting a digital fingerprint from the modified imagery, the digital fingerprint corresponding to an image area hosting the encoded signal; determining a relative spatial position of the image area based on the digital fingerprint; and providing the relative spatial position of the image area to an augmented reality (AR) system, in which the AR system overlays graphics or video on the display screen corresponding to the image area. Of course other claims and combinations are provided as well.

APPLICATION FIELD

This application is a continuation of U.S. application Ser. No. 13/789,126, filed Mar. 7, 2013 (now U.S. Pat. No. 9,684,941), which claims the benefit of U.S. Provisional Patent Application No. 61/719,920, filed Oct. 29, 2012. This application is also related to U.S. Provisional Patent Application No. 61/749,767, filed Jan. 7, 2013. Each of the above patent documents is hereby incorporated herein by reference in its entirety.

TECHNICAL FIELD

This disclosure relates to digital signal processing, image rendering (including Raster Image Processing for print), image recognition, data signal detection, and computer generated graphics in conjunction with live image capture and recognition (e.g., Augmented Reality).

BACKGROUND AND SUMMARY

There are a variety of ways to encode machine readable information on objects, and in particular, on printed objects. Conventional visible data carriers for printed media include various forms of bar codes, including monochrome (e.g., black and white) 1D and 2D bar codes, as well as newer higher density codes that use additional colors to carry data. One example of higher density bar codes is data glyphs, which are marks (e.g., forward and back slash marks) printed at higher resolution. When viewed from a distance, glyph codes can appear as a uniform tone, and as such, can be printed in the background around other visual information.

In these types of data carriers, the elementary units (bars or data glyph marks) are independent of other visual information and convey auxiliary data. A mark or arrangement of marks is a pattern that corresponds to an auxiliary data symbol. To read the data from a printed object, the object is first optically scanned with an image sensor, converting light to an electronic signal. The electronic signal is then analyzed to detect the elements of the mark and convert them to data.

Digital watermarking is a machine readable code in which the data is hidden within an image, leveraging human visibility models to minimize visual impact on the image. For certain types of applications where image information is sparse, the auxiliary data signal can still be applied to the printed object with minimal visual impact by inserting imperceptible structures having spatial frequency and color beyond the range of human visual perception. The auxiliary data signal can be conveyed by printing ink structures or modifying existing structures with changes that are too small to see or that use colors that are difficult to discern. As such, digital watermarking techniques provide the flexibility of hiding data within image content, as well as inserting data in parts of a printed object where there is little or no other visual information.

These types of visible and hidden data carriers are useful for applications where there is a need to convey variable digital data in the printed object. Hidden data carriers increase the capacity of printed media to convey visual and machine readable information in the same area. Even printed objects, or portions of objects (such as logos, pictures or graphics on a document or package) that appear identical are transformed into variable data carriers.

For some applications, it is possible to identify an image using a one-to-many pattern matching scheme. Images to be uniquely identified are enrolled in a reference database, along with metadata. In image fingerprinting schemes, image features are stored in the reference database. Then, to recognize an image, suspect images, or their features, are matched with corresponding images or features in the reference database. Once matched, the reference database can provide associated digital data stored with the image.

Data carrying signals and matching schemes may be used together to leverage the advantages of both. In particular, for applications where maintaining the aesthetic value or the information content of the image is important, a combination of digital watermarking and image fingerprinting can be used.

Combinations of watermarks and fingerprints for content identification and related applications are described in assignee's U.S. Patent Publications 20060031684 and 20100322469, which are each hereby incorporated by reference in its entirety. Watermarking, fingerprinting and content recognition technologies are also described in assignee's U.S. Patent Publication 20060280246 and U.S. Pat. Nos. 6,122,403, 7,289,643, 6,614,914, and 6,590,996, which are each hereby incorporated by reference in its entirety.

In many applications, it is advantageous to insert auxiliary data in a printed object in a way that does not impact the other visual information on the object, yet still enables the data to be reliably retrieved from an image captured of the object. To achieve this, a technique has been developed to exploit the gap between the limit of human visual perception and the limit of an image sensor. The gamut of human visual perception and the gamut of an image sensor are defined in terms of characteristics of the rendered output, including spatial resolution or spatial frequency and color. Each gamut is a multi-dimensional space expressed in terms of these characteristics. The gap between the gamut of human and sensor perception is a multidimensional space that our data insertion schemes exploit to insert auxiliary data without impacting other visual information on the object.

This multi-dimensional gap is a 5-dimensional space (2 spatial + 3 color) or higher (spatial/color shapes, frequencies, distributions) where our methods insert:

(1) uniform texture watermarks (independent of content—but controlled for visibility), and

(2) content-based watermarks where the content is used as a reference framework. As a reference, the content is either altered in a measurable but imperceptible way or used (e.g., edges) to locate and orient an underlying variation that is intended to keep the content unchanged.

Digital printing is becoming increasingly more advanced, enabling greater flexibility and control over the image characteristics used for data insertion when preparing an image for printing. The process of preparing a digital image for printing encompasses conversion of an image by a Raster Image Processor, Raster Image Processing, halftoning, and other pre-print image processing. Background on these processes is provided below.

Along with advances in printing, the gamut of even widely used image sensors is becoming greater. For hidden data insertion, the challenge is to insert the data in the human-sensor perception gap so that it can be widely detected across many consumer devices. Of course, for certain security applications, more expensive printers and image scanners can be designed to insert security features and expand the gamut of the scanning equipment used to detect such features. This is useful to detect security features and/or tampering with such features. However, the human-device perception gap is smaller for more widely deployed sensors, such as those commonly used in mobile devices like smart phones and tablet PCs.

Our data insertion methods exploit the gap more effectively through data insertion in the process of preparing a digital image for printing. Additional control over the process of inserting auxiliary data is achieved by implementing the process in the Raster Image Processor (RIP).

A raster image processor (RIP) is a component used in a printing system that produces a raster image, also known as a bitmap. The bitmap is then sent to a printing device for output. The input may be a page description in a high-level page description language such as PostScript, Portable Document Format, or XPS, or another bitmap of higher or lower resolution than the output device. In the latter case, the RIP applies either smoothing or interpolation algorithms to the input bitmap to generate the output bitmap.

Raster image processing is the process and the means of turning vector digital information such as a PostScript file into a high-resolution raster image. A RIP can be implemented either as a software component of an operating system or as a firmware program executed on a microprocessor inside a printer, though for high-end typesetting, standalone hardware RIPs are sometimes used. Ghostscript and GhostPCL are examples of software RIPs. Every PostScript printer contains a RIP in its firmware.

Half-toning is a process of converting an input image into halftone structures used to apply ink to a medium. The digital representation of a halftone image is sometimes referred to as a binary image or bitmap, as each elementary image unit or pixel in the image corresponds to the presence, or not, of ink. Of course, there are more variables that can be controlled at a particular spatial location, such as various color components (CMYK and spot colors). Some advanced printers can control other attributes of the ink placement, such as its density or spatial depth or height.

This half-toning process is typically considered to be part of the RIP or Raster Image Processing. In some printing technologies, these halftone structures take the form of clustered dots (clustered dot half-toning). In others, the halftone structures take the form of noise-like dot patterns (e.g., stochastic screens, blue noise masks, etc.).
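To make the halftoning step concrete, the following Python sketch converts a grayscale image into a binary ink bitmap by ordered dithering. The 4×4 Bayer threshold matrix shown here is an assumption for illustration (it yields a dispersed-dot screen); a clustered-dot screen or a blue-noise mask would substitute a different threshold matrix in the same tiling logic. This is a minimal sketch, not the processing of any particular RIP.

```python
import numpy as np

# 4x4 Bayer threshold matrix (dispersed-dot ordered dithering).
# A clustered-dot screen or blue-noise mask would use a different
# threshold matrix with the same tiling logic below.
BAYER_4 = (1 / 16.0) * np.array([
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
])

def halftone(gray: np.ndarray) -> np.ndarray:
    """Convert a grayscale image (floats in [0,1]) to a binary bitmap:
    True where ink is placed, False where the medium is left bare."""
    h, w = gray.shape
    # Tile the threshold matrix across the full image.
    thresh = np.tile(BAYER_4, (h // 4 + 1, w // 4 + 1))[:h, :w]
    # Darker pixels (lower values) fall below more thresholds -> more ink.
    return gray < thresh

if __name__ == "__main__":
    ramp = np.linspace(0.0, 1.0, 256).reshape(1, -1).repeat(64, axis=0)
    bitmap = halftone(ramp)
    print(f"ink coverage: {bitmap.mean():.2%}")  # ~50% for a full ramp
```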

Our patent literature provides several techniques for digital watermarking in the halftone process. Examples of these techniques are detailed in U.S. Pat. Nos. 6,694,041 and 6,760,464, which are each hereby incorporated herein by reference in its entirety.

New printing techniques enable very fine structures to be created in the RIP that appear visually identical to the eye. For example, a 50% gray can be created with a conventional clustered dot screen pattern at 150 lines per inch, or exactly the same visual effect can be created with a much higher frequency line structure such as a stochastic screen. Usually, these two structures are not mixed on one page, as they have very different dot gain characteristics and require different corrections. However, our methods are able to correct for the mechanical dot gain, so that the two patterns appear identical when they appear on the same page. See, in particular, our prior work in dot gain correction, printer calibration, and compensating for printer and scanner effects, in U.S. Pat. Nos. 6,700,995 and 7,443,537, and U.S. Patent Publication 20010040979, which are each hereby incorporated herein by reference in its entirety.

Mobile devices have a capture resolution of much greater than 150 lpi (the resolution of newer phones, such as the iPhone 4, is about 600 lpi or better), so they can be used to distinguish between these two types of patterns. One particular example is an image that appears as a uniform texture, yet a watermark pattern is inserted into it by modulating the line screen frequency and direction according to a watermark signal pattern. In particular, the locations of a watermark pattern are printed using a higher frequency line pattern at a first direction (e.g., vertical screen angle). The other locations are printed with a lower frequency line pattern in another direction (e.g., diagonal screen angle). The watermark signal is modulated into the image by selection of a higher frequency screen at an arrangement of spatial locations that form the watermark signal pattern. When printed, these locations look similar to surrounding locations. However, when scanned, the sensor sees these locations as being different, and the watermark pattern in the resulting electronic image is easier to detect.
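The sketch below illustrates, under simplified assumptions, how a watermark bit array might select between two 50%-ink line screens per cell, a fine vertical screen versus a coarser diagonal one, so that every cell prints at the same average tone while remaining distinguishable to a sufficiently sharp sensor. The cell size, screen periods, and angles are illustrative values, not parameters taken from the text.

```python
import numpy as np

def line_screen(cell: int, period: int, diagonal: bool) -> np.ndarray:
    """Binary cell filled with a line screen of the given period (pixels).
    Vertical lines by default; 45-degree lines when diagonal=True."""
    y, x = np.mgrid[0:cell, 0:cell]
    phase = (x + y) if diagonal else x
    return (phase % period) < (period // 2)   # ~50% ink either way

def embed_pattern(bits: np.ndarray, cell: int = 32) -> np.ndarray:
    """Render a watermark bit array as a uniform-looking 50% tint:
    1-bits get a fine vertical screen, 0-bits a coarser diagonal screen.
    To the eye both cells average to the same tone; a camera with enough
    resolution distinguishes them."""
    rows, cols = bits.shape
    out = np.zeros((rows * cell, cols * cell), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            if bits[r, c]:
                screen = line_screen(cell, period=4, diagonal=False)
            else:
                screen = line_screen(cell, period=8, diagonal=True)
            out[r*cell:(r+1)*cell, c*cell:(c+1)*cell] = screen
    return out

bits = np.random.default_rng(7).random((16, 16)) > 0.5
tint = embed_pattern(bits)   # looks like uniform gray from a distance
```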

This approach allows a whole set of new messaging techniques to be used in the range between 150 lpi and 600 lpi, where 2 spatial dimensions and 3 dimensions of color information can be inserted. This information can be a watermark, barcode or any other signaling mechanism.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the creation of a content recognition system using fingerprints and watermarks.

FIG. 2 is a block diagram illustrating the content identification process.

FIG. 3 is a diagram of a cell phone, which may be used in some content recognition systems.

FIG. 4 is a diagram showing image capture of a subject surface.

FIG. 5 is a block diagram of resolving pose information from captured imagery.

FIG. 6 is a timeline associated with resolving pose information to aid digital watermark detection.

FIG. 7 is a diagram of an Augmented Reality system providing a video overlay in a device display that corresponds to a watermarked area on a subject surface.

FIG. 8 shows a subject area that includes multiple different watermarked areas.

DETAILED DESCRIPTION

The process of digital watermark insertion includes generating a watermark signal, and then using that signal to modulate characteristics of the image in the human-sensor gap. As described above, this process is preferably conducted at the RIP stage to enable control over the image representation used to control application of ink to a print medium.

In prior work, several methods for generating the watermark signal, and for detecting the watermark signal in images captured of printed objects, are detailed. Please see U.S. Pat. Nos. 6,614,914 and 6,590,996, which are incorporated by reference. Therefore, for this discussion, the focus is on techniques used within the RIP to insert the watermark signal.

In one implementation, the watermark signal is generated as an array of watermark signal elements. These elements are mapped to spatial locations within an image block, called a tile. This tile is then replicated (e.g., tiled in a regular, contiguous array of blocks in two dimensions) across the area of the host image in which the watermark signal is to be inserted. At a spatial location where there is image content in the host image, the watermark signal element is used to modify the host image content at that location to carry the watermark signal element, subject to constraints set for perceptual masking. These constraints enable the watermark signal to be increased or decreased (possibly to zero), depending on perceptual masking and desired watermark signal strength. Conversely, where there is no image content, the watermark signal element is either not applied, or it can be asserted as a texture, using colors and spatial resolution that make it difficult to discern. As such, for every location in the watermark signal mapping, there is an opportunity for watermark modulation.
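The tiling and masked insertion just described can be sketched in a few lines of Python. The gradient-based activity mask below is a crude stand-in for a real human-visual-system masking model, and the gain value is illustrative only; where the mask is zero, the element is simply not applied, mirroring the text above.

```python
import numpy as np

def insert_tiled_watermark(host, tile, gain=4.0, mask_strength=None):
    """Replicate a watermark tile across the host image and add it,
    scaled by a per-pixel perceptual mask.
    host: 2D float array (0-255); tile: 2D array of +/-1 signal elements."""
    h, w = host.shape
    th, tw = tile.shape
    # Tile the watermark signal in a regular, contiguous 2D array.
    wm = np.tile(tile, (h // th + 1, w // tw + 1))[:h, :w]
    if mask_strength is None:
        # Crude activity mask: local gradient magnitude as a stand-in
        # for a true perceptual masking model.
        gy, gx = np.gradient(host)
        mask_strength = np.clip(np.hypot(gx, gy), 0.0, 1.0)
    return np.clip(host + gain * mask_strength * wm, 0.0, 255.0)
```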

As noted in more examples below, the watermark signal need not be mapped to a uniform array of blocks. One alternative is to use feature points in the image to form a spatial reference for insertion of a data signal.

The watermark signal can be comprised of a single data component, or more than one component, as detailed in U.S. Pat. No. 6,614,914. One component is a direct sequence spread spectrum modulated data signal. This component is generated by applying error correction coding (convolutional coding) to a data signal, which produces an error correction coded data signal. This signal is then modulated onto pseudorandom carrier signals to produce a spread spectrum modulated signal, which is then mapped to locations in a tile. This is one example of watermark signal generation, and there are many others.
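A minimal sketch of that generation chain follows. Simple repetition coding stands in for the convolutional coding named above, and the tile length, repetition factor, and seed are illustrative assumptions; a real embedder would share its code and carrier parameters with the detector.

```python
import numpy as np

def spread_spectrum_tile(payload_bits, tile_len=1024, rep=3, seed=42):
    """Error-code the payload (repetition here, standing in for the
    convolutional coding named above), spread each coded bit over a
    pseudorandom +/-1 carrier, and scatter the chips across a tile."""
    rng = np.random.default_rng(seed)
    coded = np.repeat(np.asarray(payload_bits), rep)        # error coding
    coded = 2.0 * coded - 1.0                               # {0,1} -> {-1,+1}
    chips_per_bit = tile_len // coded.size
    carrier = rng.choice([-1.0, 1.0],
                         size=(coded.size, chips_per_bit))  # PN carriers
    chips = (coded[:, None] * carrier).ravel()              # modulate
    tile = np.zeros(tile_len)
    order = rng.permutation(tile_len)[:chips.size]          # map to tile
    tile[order] = chips
    return tile

tile = spread_spectrum_tile([1, 0, 1, 1, 0, 1, 0, 0])
```

A detector that knows the seed can regenerate the same carriers and recover each bit by correlation, which is what gives spread spectrum signals their robustness to noise.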

Above, a method of inserting a watermark signal by varying the print structure within the RIP to modulate the watermark into an image is illustrated. A specific example is given for varying the density and direction or angle of print primitives (e.g., line structures or dots) used to print a particular color in an image having a uniform tone. A data signal pattern may also be introduced by varying the halftone screen type for different regions of an image. Print structures can vary among a set of screening types, including noise-like (e.g., stochastic screens) and structured (clustered dot, line screens, etc.). This approach is not limited to watermark modulation of images with uniform tones, as it applies to inserting watermarks into various types of image content.

Some embodiment examples include the following:

a) Choose the angle in colorspace of watermark signal modulation (e.g., in the ab plane of Lab colorspace) to be different at different regions throughout the image (a minimal sketch follows this list). In one class of digital watermark embodiments, these regions correspond to watermark signal elements, and an arrangement of the regions forms a spatial watermark signal pattern of watermark signal elements. Data may be modulated into the pattern using spread spectrum modulation as noted above, or other data modulation schemes. The arrangement, orientation and shape of these regions may be designed to convey alternative data code signals. Multiple data signals may be interleaved for different spatial locations, as well as different directions in color space.

b) Choose the spatial frequency of watermark signal modulation to be different at different regions throughout the image. Similar data insertion as mentioned for section a) also applies to this section b).

c) Use the edges of the image content to define a signal along the edges in an imperceptible set of dimensions (color & spatial frequency). In this case, the edges detected in the image are used as a reference for the watermark signal. Thus, rather than being arranged in a pre-determined array of blocks or regions, the watermark signal is inserted along the direction of the edge. Along this edge, the watermark signal can have a regular pattern or structure to facilitate detection. The watermark signal is detected by first finding the edges and then detecting the watermark signal relative to these edges (e.g., by correlating the image signal with the regular pattern of the data signal).

d) Use the edges of the content to define a signal perpendicular to the edges in an imperceptible set of dimensions (color & spatial frequency). As in the previous example, the edges provide a reference orientation and location of the watermark signal.

e) Use higher dimensional shapes/patterns of color and spatial variations where pixels separated spatially may still be close in either spatial or color patterns. This reduces sensitivity to geometric distortions.
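The following sketch illustrates embodiment a): nudging the chrominance of a region along a chosen angle in the ab plane of Lab space, leaving lightness untouched. It assumes scikit-image for the color conversions; the strength and angle values are illustrative, and a real embedder would set them from a visibility model.

```python
import numpy as np
from skimage.color import rgb2lab, lab2rgb  # pip install scikit-image

def modulate_ab_angle(rgb, region_mask, signal, angle_deg, strength=1.5):
    """Shift the chrominance of the masked region along a chosen angle
    in the ab plane of Lab space, leaving lightness L untouched.
    rgb: float image in [0,1]; region_mask: 0/1 array (H, W);
    signal: +/-1 watermark element array (H, W)."""
    lab = rgb2lab(rgb)
    theta = np.deg2rad(angle_deg)
    delta = strength * signal * region_mask
    lab[..., 1] += delta * np.cos(theta)   # a channel
    lab[..., 2] += delta * np.sin(theta)   # b channel
    return np.clip(lab2rgb(lab), 0.0, 1.0)
```

Different regions would call this with different angle_deg values, so the modulation direction in color space itself carries part of the signal.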

In some embodiments, those higher frequency spatial/color variations are designed to take advantage of lower resolution devices to generate shifts in image characteristics that can be measured. The data signal elements are inserted to exploit the Bayer pattern of RGB sensors to enhance a desired data signal that would otherwise be imperceptible. These signal elements are designed to induce distortion (e.g., aliasing, or a color shift) in the image captured through the sensor of the printed object. This distortion at the data signal locations enhances the pattern because the shift in signal characteristics at these locations increases the data signal at these locations relative to surrounding image content and noise. For example, aliasing caused by capturing a high frequency screen region with a lower frequency sensor creates a detectable data signal element at that region.

A similar effect can also be achieved by modulating ink height using a printer that is capable of controlling the height of ink deposited at a particular location. These printers enable control over the height of ink by building up ink at a particular print location. This is useful for authentication or copy protection applications.

The height of the structure can be used to carry information by viewing at an angle with a device such as a fixed focus (or Lytro) camera.

The height variations can also be designed to cause color changes that are used to carry information. When the print is viewed normally, these height variations would be imperceptible if the pigment is opaque. This information can be a watermark, barcode or any other signaling mechanism.

The above methods apply to a variety of print primitives, and are not limited to particular line screens or clustered dot structures. With control over the RIP, the shape, spatial frequency, and orientation of structures can be specifically designed to exploit sensor geometries and Modulation Transfer Function (MTF) characteristics to cause discrimination between local regions of an image. For example, small lines slanted left and right at different spatial frequencies. Or solid dots vs. tiny dot clusters that contain the same ink density on the physical object, but differ in color after acquisition through a class of image sensors (such as those sensors widely used in smartphone cameras). Some regions may use a form of noise-like dot pattern (e.g., stochastic screening), while others use a shape with particular structure, like a clustered dot or line screen. The dot gain varies with the number of edges (perimeter) of the print structures, so the amount of dot gain correction is also adapted based on the print structure. For example, in the example above where some regions are printed with high frequency line structures and others with lower frequency, the line widths in the high frequency structure have to be reduced more than the line widths in the lower frequency structure to compensate for dot gain.

Another approach that can be implemented within the RIP is to transform the image into a form for printing so that it has carefully controlled noise characteristics. The noise characteristics can be set globally across an image to indicate the presence of a watermark. The noise itself can comprise digital data, such as a spread spectrum modulated data signal. Alternatively, the RIP can generate an image with a pattern of regions that are detectable based on distinguishable noise characteristics. The arrangement of this pattern can be used as a reference signal to provide the location and orientation of a watermark signal inserted in the image.

The watermark may also be conveyed using a reversible image transform or detailed image characterization by manipulating the image through either transform coefficients or local noise manipulations in a detectable yet imperceptible way. One form of reversible transform is the grayscale medial axis transform applied separately to the color directions. See, in particular, "Image approximation from gray scale 'medial axes'" by Wang, S.; Wu, A. Y.; Rosenfeld, A., in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, November 1981, pp. 687-696.

A stochastic modeling approach that allows for detectable manipulations is the Markov Random Field (MRF) model, which can be used to define local pixel relationships that convey watermark signal data elements. The MRF manipulation is particularly interesting because it can be designed to have particular noise properties that might be exploited at the detector. See "How to generate realistic images using gated MRF's," Marc'Aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton, Department of Computer Science, University of Toronto.

SIFT Description

SIFT is an acronym for Scale-Invariant Feature Transform, a computer vision technology developed by David Lowe and described in various of his papers, including "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110; and "Object Recognition from Local Scale-Invariant Features," International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157, as well as in U.S. Pat. No. 6,711,293.

SIFT works by identification and description—and subsequent detection—of local image features. The SIFT features are local and based on the appearance of the object at particular interest points, and are invariant to image scale, rotation and affine transformation. They are also robust to changes in illumination, noise, and some changes in viewpoint. In addition to these properties, they are distinctive, relatively easy to extract, allow for correct object identification with low probability of mismatch, and are straightforward to match against a (large) database of local features. Object description by a set of SIFT features is also robust to partial occlusion; as few as 3 SIFT features from an object can be enough to compute location and pose.

The technique starts by identifying local image features—termed keypoints—in a reference image. This is done by convolving the image with Gaussian blur filters at different scales (resolutions), and determining differences between successive Gaussian-blurred images. Keypoints are those image features having maxima or minima of the difference of Gaussians occurring at multiple scales. (Each pixel in a difference-of-Gaussian frame is compared to its eight neighbors at the same scale, and to the corresponding pixels in each of the neighboring scales, e.g., nine pixels in each of the two adjacent scales. If the pixel value is a maximum or minimum among all these pixels, it is selected as a candidate keypoint.)

(It will be recognized that the just-described procedure is a blob-detection method that detects space-scale extrema of a scale-localized Laplacian transform of the image. The difference of Gaussians approach is an approximation of such Laplacian operation, expressed in a pyramid setting.)
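A compressed sketch of this difference-of-Gaussians extrema search is shown below, assuming SciPy. Real SIFT works in octaves with subsampling and compares against exactly 26 scale-space neighbors; here a single stack of blur scales and a 3×3×3 filter neighborhood stand in for that machinery, and the sigma and contrast values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_candidate_keypoints(gray, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.02):
    """Find candidate keypoints as local extrema of the
    difference-of-Gaussians across space and scale."""
    blurred = [gaussian_filter(gray.astype(float), s) for s in sigmas]
    # Differences between successive Gaussian-blurred images.
    dogs = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    # Extrema over a 3x3x3 scale-space neighborhood.
    is_max = dogs == maximum_filter(dogs, size=3)
    is_min = dogs == minimum_filter(dogs, size=3)
    strong = np.abs(dogs) > thresh      # drop low-contrast candidates
    scale_idx, ys, xs = np.nonzero((is_max | is_min) & strong)
    return list(zip(ys, xs, scale_idx))
```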

The above procedure typically identifies many keypoints that are unsuitable, e.g., due to having low contrast (thus being susceptible to noise), or due to having poorly determined locations along an edge (the Difference of Gaussians function has a strong response along edges, yielding many candidate keypoints, but many of these are not robust to noise). These unreliable keypoints are screened out by performing a detailed fit on the candidate keypoints to nearby data for accurate location, scale, and ratio of principal curvatures. This rejects keypoints that have low contrast, or are poorly located along an edge.

More particularly, this process starts by—for each candidate keypoint—interpolating nearby data to more accurately determine keypoint location. This is often done by a Taylor expansion with the keypoint as the origin, to determine a refined estimate of maxima/minima location.

The value of the second-order Taylor expansion can also be used to identify low contrast keypoints. If the contrast is less than a threshold (e.g., 0.03), the keypoint is discarded.

To eliminate keypoints having strong edge responses but that are poorly localized, a variant of a corner detection procedure is applied. Briefly, this involves computing the principal curvature across the edge, and comparing it to the principal curvature along the edge. This is done by solving for eigenvalues of a second order Hessian matrix.

Once unsuitable keypoints are discarded, those that remain are assessed for orientation, by a local image gradient function. Magnitude and direction of the gradient are calculated for every pixel in a neighboring region around a keypoint in the Gaussian blurred image (at that keypoint's scale). An orientation histogram with 36 bins is then compiled—with each bin encompassing ten degrees of orientation. Each pixel in the neighborhood contributes to the histogram, with the contribution weighted by its gradient's magnitude and by a Gaussian with σ 1.5 times the scale of the keypoint. The peaks in this histogram define the keypoint's dominant orientation. This orientation data allows SIFT to achieve rotation robustness, since the keypoint descriptor can be represented relative to this orientation.

From the foregoing, plural keypoints at different scales are identified—each with corresponding orientations. This data is invariant to image translation, scale and rotation. 128-element descriptors are then generated for each keypoint, allowing robustness to illumination and 3D viewpoint.

This operation is similar to the orientation assessment procedure just reviewed. The keypoint descriptor is computed as a set of orientation histograms on (4×4) pixel neighborhoods. The orientation histograms are relative to the keypoint orientation, and the orientation data comes from the Gaussian image closest in scale to the keypoint's scale. As before, the contribution of each pixel is weighted by the gradient magnitude, and by a Gaussian with σ 1.5 times the scale of the keypoint. Histograms contain 8 bins each, and each descriptor contains a 4×4 array of 16 histograms around the keypoint. This leads to a SIFT feature vector with 4×4×8 = 128 elements. This vector is normalized to enhance invariance to changes in illumination.

The foregoing procedure is applied to training images to compile a reference database. An unknown image is then processed as above to generate keypoint data, and the closest-matching image in the database is identified by a Euclidean distance-like measure. (A "best-bin-first" algorithm is typically used instead of a pure Euclidean distance calculation, to achieve several orders of magnitude speed improvement.) To avoid false positives, a "no match" output is produced if the distance score for the best match is close—e.g., within 25%—to the distance score for the next-best match.
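This matching step, including the "no match if the best and next-best scores are close" rule, can be sketched with OpenCV's SIFT implementation. Here a brute-force matcher stands in for the best-bin-first approximate search, and the 0.75 ratio is a commonly used illustrative value rather than one taken from the text.

```python
import cv2  # pip install opencv-python (SIFT is in the main package since 4.4)

def match_against_reference(query_img, ref_img, ratio=0.75):
    """Match SIFT descriptors with a ratio test: keep a match only if
    its distance is sufficiently smaller than the second-best distance."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(query_img, None)
    kp2, des2 = sift.detectAndCompute(ref_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)   # brute force, Euclidean distance
    good = []
    for m, n in matcher.knnMatch(des1, des2, k=2):
        if m.distance < ratio * n.distance:   # reject ambiguous matches
            good.append(m)
    return kp1, kp2, good
```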

To further improve performance, an image may be matched by clustering. This identifies features that belong to the same reference image—allowing unclustered results to be discarded as spurious. A Hough transform can be used—identifying clusters of features that vote for the same object pose.

An article detailing a particular hardware embodiment for performing the SIFT procedure, suitable for implementation in a next generation cell phone, is Bonato et al, "Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection," IEEE Trans. on Circuits and Systems for Video Tech., Vol. 18, No. 12, 2008.

An alternative hardware architecture for executing SIFT techniques is detailed in Se et al, "Vision Based Modeling and Localization for Planetary Exploration Rovers," Proc. of Int. Astronautical Congress (IAC), October 2004.

While SIFT is perhaps the most well-known technique for generating robust local descriptors, there are others, which may be more or less suitable—depending on the application. These include GLOH (c.f., Mikolajczyk et al, "Performance Evaluation of Local Descriptors," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 27, No. 10, pp. 1615-1630, 2005) and SURF (c.f., Bay et al, "SURF: Speeded Up Robust Features," Eur. Conf. on Computer Vision (1), pp. 404-417, 2006; Chen et al, "Efficient Extraction of Robust Image Features on Mobile Devices," Proc. of the 6th IEEE and ACM Int. Symp. on Mixed and Augmented Reality, 2007; and Takacs et al, "Outdoors Augmented Reality on Mobile Phone Using Loxel-Based Visual Feature Organization," ACM Int. Conf. on Multimedia Information Retrieval, October 2008).

Watermarking and Fingerprinting System Configurations

FIG. 1 is a block diagram illustrating the creation of a content recognition system using fingerprints and watermarks. The digitized input image/video/audio signals 100 are input to the fingerprint calculator/watermark embedder 102, which computes multiple fingerprints for each content item to be uniquely recognized, and also watermarks the content item. In a database entry process 104, the fingerprints are entered and stored in a database, along with additional information, such as metadata for the content item and a digital master copy for use as needed (see Patent Application Publication 20100322469 for a description of techniques involving use of original content in watermark detection and determining location within content). A database organization process 106 in a database system sorts and arranges the fingerprints in a data structure, such as a tree structure, to enable fast searching and matching. This database itself may be distributed over an array of computers in an identification network (108). This network receives queries to identify or recognize content items based on a stream of fingerprints and/or watermarks from a requesting device, such as a user's handheld mobile device or other computing device (node in a network of monitoring devices).

FIG. 2 is a block diagram illustrating the content identification process. Incoming signals 109 are captured in a receiver 110. This includes still or video image capture, in which images are captured and digitized with an image sensor like a camera or other image capture device, as well as ambient audio capture by microphone. It also includes receipt of audio, image or video content in a broadcast or transmission channel, including a broadcast stream or file transfer. The recognition process may be invoked as part of systematic Internet monitoring or broadcast monitoring of content signals, in-home audience measurement, batch database searching and content indexing, or user requests for content recognition and metadata searching. The fingerprint calculator/watermark extractor 112 computes fingerprints and/or watermarks for incoming content items and issues them to a database for a database search for matching fingerprints and a data look-up for watermark-based identifiers 114. The fingerprint matches found in the search process and the watermark identifiers provide content identification (a number or some other form of index for metadata lookup), which in turn enables look-up of metadata corresponding to the content identification in one or more metadata databases. The metadata is then returned to device 116 for display/output or further processing. This may involve returning metadata to a device that requested the database search, or to some other device to which the search results are directed (e.g., a user's home device, or a monitoring system's data collection database in which the metadata and recognition events are aggregated and compiled for electronic report generation).

AR Exploitation

Sometimes watermark detection needs properly aligned image data to establish a proper registration for reliable payload recovery. Suitable image alignment is difficult to achieve in many mobile environments. For example, and with reference to FIG. 4, a smartphone captures imagery of a subject surface (e.g., a magazine, newspaper, object, etc.). The pose of the smartphone's video camera relative to the subject surface (sometimes referred to as "image pose") changes as a user positions the phone to capture video. In this context, pose can include perspective angle, scale, rotation and translation.

I have developed methods and systems to accurately estimate geometric capture distortion and modify imagery prior to watermark detection. This can be used in connection with augmented reality overlays to provide rich user experiences. But it all starts with determining the correct relative pose.

As an initial overview, and with reference to FIG. 5, captured image frames are analyzed to identify key points. These key points can be tracked over time to resolve relative image geometry, including pose. The captured imagery can be modified according to the resolved geometry to remove any distortion introduced by relative camera positioning, including, e.g., removing rotation, perspective angle, scale, etc. The watermark detector can analyze the modified, captured imagery in search of a previously hidden digital watermark.

Our methods can be implemented by many suitable electronic devices. One example is a portable device including a video camera, e.g., a smartphone, tablet, pad, etc. With reference to FIG. 6, software (e.g., a smartphone App) is enabled on the portable device. (One example of the software may include a modified version of Digimarc's Digimarc Discover application. From Digimarc's website: "Digimarc Discover uses multiple content identification technologies—digital watermarking, audio fingerprinting and QR code and barcode detection—to give smartphones the ability to see, hear and engage with all forms of media. Consumers simply launch the Digimarc Discover app and point their phone at the content of interest—an ad, article, package, retail sign, etc.—and are instantly connected to a menu of optional experiences such as learn more, view a video, launch an app, map directions, share via social media, save for later or make a purchase.")

Image data, e.g., video frames captured by the device's video camera, is gathered and provided to a pose detector or detection process to determine the pose of the camera relative to a depicted subject surface. Captured imagery can be modified to remove any distortion, e.g., scale, perspective, translation, rotation. The modified imagery is analyzed for hidden digital watermarking. Once detected, the digital watermarking can serve as a backbone for an augmented reality (AR) experience. For example, the watermarking may include a link to obtain video. The video can be overlaid in a device display area. In some cases, the video can be overlaid in an image display area spatially corresponding to the portion of the subject surface that includes digital watermarking (FIG. 7). Updated pose information can be provided to ensure that the overlaid graphics or video continue to be positioned where intended, e.g., the video can continue to be played in the intended spatial area, even as the camera moves relative to the object's surface.

Positioning and tracking of overlay graphics and video can be enhanced, e.g., by tracking and mapping image frames or features within the image frames. For example, a keyframe-based SLAM system as discussed in Klein et al., "Parallel Tracking and Mapping on a Camera Phone," Mixed and Augmented Reality, ISMAR 2009, 8th IEEE International Symposium, 19-22 Oct. 2009, which is hereby incorporated by reference in its entirety, could be used. Other tracking, such as natural feature tracking or marker-based systems, etc., could be used as well for the positioning and tracking of overlay graphics, video and other AR features.

But let's go back and discuss pose detection in further detail.

Imagery (video) frames are captured with a device sensor, e.g., a camera. A first image frame I₁ is analyzed to detect "key points". A key point generally represents a robust image characteristic. Some examples of key points include, e.g., a feature corner or other characteristic, an area having one or more (locally) large non-zero derivatives, etc. Other features as discussed above under the SIFT section can be used as well. Homography matrices can be constructed representing key points from I₁ relative to another image frame I₂. (Of course, it is not necessary for frames I₁ and I₂ to be adjacently located frames. In fact, there is some benefit for frames to have some sufficient distance between them to have a representable difference in rotation, scale, translation, perspective, etc. Additionally, a homography can be estimated from an image pair itself (e.g., two images), instead of from two (2) or more sets of corresponding key points.) For example, the ESM homography described in Benhimane et al, "Homography-based 2D Visual Tracking and Servoing," The International Journal of Robotics Research, Vol. 26, No. 7, pages 661-676, July 2007, could be used to represent a transform between key points in different image frames. The Benhimane paper is hereby incorporated herein by reference in its entirety. In noisy imagery, we've found that 20-60 key points are sufficient. Of course, more or fewer key points could be used with varying degrees of success.
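As a concrete sketch of estimating a frame-to-frame homography from key points, the following assumes OpenCV and uses ORB features with RANSAC. This is a stand-in for illustration; the ESM method cited above, or SIFT features from the earlier section, would serve equally well as the key point and estimation machinery.

```python
import cv2
import numpy as np

def frame_pair_homography(frame1, frame2, max_pts=60):
    """Estimate the homography relating two video frames from tracked
    key points (20-60 points, per the observation above)."""
    orb = cv2.ORB_create(nfeatures=max_pts * 4)
    kp1, des1 = orb.detectAndCompute(frame1, None)
    kp2, des2 = orb.detectAndCompute(frame2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2),
                     key=lambda m: m.distance)[:max_pts]
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects outlier correspondences in noisy imagery.
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```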

Multiple pose homographies can be constructed, e.g., between I₁ and I₂, I₂ and I₃, I₃ and I₄, and so on. Given at least four (4) views (e.g., frames) of the subject surface, and corresponding pose homographies between the frames, a cost function can be utilized to find pose information that best fits a current frame. I prefer to use between 4-10 homographies with a cost function; however, additional homographies may be used as well. The techniques (including the cost function in section 2.2.1) described in Pirchheim et al., "Homography-Based Planar Mapping and Tracking for Mobile Phones," could be used to find such pose information. The Pirchheim paper is hereby incorporated herein by reference in its entirety. The homography that minimizes the cost function can be used to provide pose information.

Pirchheim's Section 2.2.1 states:

“2.2.1 Cost Function and Parameterization

In the following we describe the mathematical formulation of the optimization scheme given in [A. Ruiz, P. E. L. de Teruel, and L. Fernandez. Practical planar metric rectification. In Proc. BMVC 2006, 2006] for completeness. We define the scene plane to be located in the canonical position z=0 corresponding to the (x;y) plane. Thus, points on the plane have a z-coordinate equal zero and can be written as (x;y;0;1) in homogeneous coordinates.

The unknowns in the optimization are the camera poses Pi relative to this plane. Under the assumption that all world points are located on the plane, camera poses can easily be re-formulated as 2D homographies by eliminating the third column of the pose matrix Pi:

$\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim \begin{pmatrix} R & t \end{pmatrix} \begin{pmatrix} x \\ y \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} r_{1} & r_{2} & t \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \qquad (1)$

The resulting pose homographies have the following important property, based on the observation that their first and second columns are ortho-normal vectors, where r₁ and r₂ are the first and second columns of R respectively:

$C^{T} \cdot C = \begin{pmatrix} r_{1}^{T} \\ r_{2}^{T} \\ t^{T} \end{pmatrix} \begin{pmatrix} r_{1} & r_{2} & t \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdot \\ 0 & 1 & \cdot \\ \cdot & \cdot & \cdot \end{pmatrix} \qquad (2)$

Additionally, given a pose homography C₁ and the homography H_(2,1) mapping from camera C₁ to C₂, the corresponding pose homography C₂ can be computed as follows:

C₂=H_(2,1)·C₁.  (3)

C₁ must observe the constraint (2). Moreover, by substituting (3) into (2) we obtain the following additional constraint for C₁:

$C_{2}^{T} \cdot C_{2} = \left( C_{1}^{T} H_{2,1}^{T} \right) \cdot \left( H_{2,1} C_{1} \right) = \begin{pmatrix} 1 & 0 & \cdot \\ 0 & 1 & \cdot \\ \cdot & \cdot & \cdot \end{pmatrix} \qquad (4)$

We can formulate the constraint as a cost function on C₁ by enforcing that the off-diagonal entries are 0 and the diagonal entries have the same value. Thus, we define the following cost function for one homography H_(i,1):

$\begin{matrix}{{{\left( {H_{i,1}C_{1}} \right)^{T}\left( {H_{i,1}C_{1}} \right)} = \begin{pmatrix}a_{1,1} & a_{1,2} & \cdot \\a_{1,2} & a_{2,2} & \cdot \\ \cdot & \cdot & \cdot \end{pmatrix}},} & {{~~~~~~~~~~~~~~~~~~~~~~~}(5)} \\{{e_{i}\left( C_{1} \right)} = {\left( {a_{1,2}/a_{1,1}} \right)^{2} + {\left( {{a_{2,2}/a_{1,1}} - 1} \right)^{2}.}}} & {(6)}\end{matrix}$

The resulting cost function (6) exploits well-known orthogonality constraints over the image of the absolute conic [R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004] and holds for any homography H_(i,1) mapping from the reference camera to another camera i. For a set of cameras C_i, all connected with individual homographies H_(i,1) to a reference camera C₁, we construct a cost function by adding up individual costs, obtaining a single cost function for the unknown reference camera pose C₁:

$\begin{matrix}{{e\left( C_{1} \right)} = {\sum\limits_{i}{{e_{i}\left( C_{1} \right)}.}}} & (7)\end{matrix}$

Overall, the whole problem of estimating all camera poses Ci can be reduced to finding one camera pose C₁ that minimizes the total cost function (7). A homography H_(2,1) between two cameras has 8 degrees of freedom because it is defined up to scale. By fixing the unknown plane and allowing the second camera C₂ to move freely, the first camera C₁ has only 2 degrees of freedom left. Ruiz et al. [ . . . ] propose to fix the camera position and vary the camera tilt (x-axis) and roll (z-axis) angles but remain vague concerning the valid 2DOF parameter range. Geometrically, we interpret the parameterization as depicted in FIG. 4. Plane and reference camera are defined to be located in canonical position, the plane aligning with the world (x;y) plane and the reference camera located at position (0;0;−1) such that world and camera coordinate systems align. We assume that the plane rotates and the camera stays fixed. The first rotation around the x-axis lets the plane move along a circle aligned with the (y;z) camera plane. The second rotation lets the plane move along another circle aligned with the (x;y) camera plane. Avoiding the plane to be rotated behind the camera, we define (−π/2;π/2) as range for the x-rotation parameter. For the z-rotation parameter we define [−π/2, π/2) as the valid range to avoid solution symmetry."

The above-mentioned papers: i) A. Ruiz, P. E. L. de Teruel, and L. Fernandez, Practical Planar Metric Rectification, in Proc. BMVC 2006, 2006, and ii) R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, second edition, 2004, are each hereby incorporated herein by reference in their entireties.

There are many refinements. For example, different homographies can be created for different pose parameters, e.g., separating out image translation or grouping together scale and rotation, etc. Also, a first pose estimate can be provided based on one or more pose parameters, and the estimate can then be refined using additional parameters.
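Putting the quoted formulation into code, the sketch below minimizes the total cost (7) over the 2-DOF tilt/roll parameterization described in the quoted text. The exact construction of the reference pose homography (rotation order, fixed camera at (0,0,−1)) is an assumption for illustration; the paper's parameterization may differ in detail.

```python
import numpy as np
from scipy.optimize import minimize

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def pose_homography(tilt, roll):
    """Reference pose homography C1 = (r1 r2 t): plane in canonical
    position, camera fixed, plane tilted about x then rolled about z.
    This construction is an illustrative assumption."""
    R = rot_x(tilt) @ rot_z(roll)
    t = np.array([0.0, 0.0, 1.0])
    return np.column_stack([R[:, 0], R[:, 1], t])

def cost(params, homographies):
    """Total cost (7): sum of per-homography costs (6), built from the
    orthogonality constraint (4)."""
    C1 = pose_homography(*params)
    total = 0.0
    for H in homographies:      # each H maps the reference camera to camera i
        A = (H @ C1).T @ (H @ C1)
        total += (A[0, 1] / A[0, 0]) ** 2 + (A[1, 1] / A[0, 0] - 1.0) ** 2
    return total

def solve_reference_pose(homographies):
    """Minimize over the 2-DOF (tilt, roll) ranges given in the quote."""
    res = minimize(cost, x0=[0.1, 0.0], args=(homographies,),
                   bounds=[(-np.pi/2 + 1e-3, np.pi/2 - 1e-3),
                           (-np.pi/2, np.pi/2 - 1e-6)])
    return pose_homography(*res.x), res.fun
```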

Captured image data can be modified to remove or reduce distortion based on the pose information. Watermark detection can be carried out on the modified imagery.
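A minimal rectification sketch, again assuming OpenCV: invert the estimated pose homography and warp the frame so the watermark detector sees approximately fronto-parallel imagery. The assumption that the pose homography maps plane coordinates to frame coordinates (and so its inverse rectifies the frame) is for illustration; a production detector might also fold in scale normalization.

```python
import cv2
import numpy as np

def rectify_for_watermark(frame, H_pose, out_size=(512, 512)):
    """Undo the capture distortion described by the estimated pose
    homography prior to watermark detection."""
    H_inv = np.linalg.inv(H_pose)
    H_inv /= H_inv[2, 2]                    # normalize (defined up to scale)
    return cv2.warpPerspective(frame, H_inv, out_size)
```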

The pose information need not be perfect; it preferably gets the geometry close enough (in the ballpark) for watermark detection. For example, the digital watermarking detection currently used in the Digimarc Discover application can produce reads with a perspective angle of up to about ±30-35°.

Successful watermark detection can launch an AR experience as discussed above. A watermark payload bit (or bits) can also be used to trigger an announcement to a user that an AR overlay is about to launch and/or to offer the user, e.g., a chance to cancel or proceed with the AR experience.

The pose detector can continue to detect pose information (e.g., based on minimizing a cost function associated with pose homographies) from captured imagery long after a watermark has been detected. This may provide the AR system with continuing pose information as the AR experience continues. This continuing pose information can be provided to the AR system to help determine the relative positioning of any overlay graphics relative to captured imagery.

A potentially more accurate approach is to provide base-line orientation information from digital watermark detection. For example, successful watermark detection may also provide image orientation information. Indeed, digital watermarking may include orientation attributes (see, e.g., U.S. Pat. Nos. 8,243,980; 7,116,781 and 6,614,914, which are each hereby incorporated herein by reference in its entirety) that are helpful to identify the original rotation, scale and translation of the imagery when the watermark was inserted. This base-line orientation information can be used by an AR system, e.g., for transforming captured imagery for display on a device screen to accommodate relative capture device pose. (Watermark orientation information can also be used to update or reset pose information being calculated by a pose detector.)

Watermark information can be used to modify or remove unwanted rotation, scaling or translation, essentially restoring the image to the state in which it was watermarked. This restored image content allows for reliable digital fingerprint analysis. Consider the possibilities.

Having access to the original image when embedding watermarking, a watermark embedder can analyze image areas and, based, e.g., on color, luminance, texture and/or coefficient information, can calculate a fingerprint of each area. For example, and with reference to FIG. 8, areas 1-6 are separately fingerprinted. This information can be stored in association with a digital watermark that is embedded in the areas.

A watermark detector later encounters imagery depicting areas 1-6. If the watermark is redundantly encoded in areas 1-6 (e.g., the same watermark is placed in each area), the detector might have trouble determining whether it detected the watermark from area 3 vs. area 1 vs. area 4, and so on. This may matter if a different AR experience is intended for different areas on the subject's surface.

Since the imagery is restored to its original or near-original form, the watermark detector, or a unit cooperating with the watermark detector, may compute a corresponding digital fingerprint of the detection area. This can be compared to the original fingerprint (created at embedding) to determine the location of the watermark detection area, e.g., does the fingerprint correspond to area 1 or 3 or 4. In one example, the fingerprint calculation process uses coefficients of a linear projection. When a watermark is read, the watermark detector (or software/device cooperating with the detector) may communicate the watermark payload to a registry. This registry may include the original fingerprint information that the detector can use to determine the digital watermark read location. Knowing the location of a detection block can be important in some applications where the spatial position of the watermark on a surface is used by an AR system (e.g., overlaying video only over certain areas of a photograph that contains multiple watermark areas or blocks).
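The linear-projection fingerprint mentioned above might be sketched as follows. A fixed random projection basis stands in for whatever projection the embedder actually registers, and the shared seed models the registry distributing that basis to detectors; both are assumptions for illustration.

```python
import numpy as np

def area_fingerprint(block, n_coeffs=32, seed=1234):
    """Fingerprint an image area as coefficients of a linear projection.
    The fixed random basis is a stand-in for the embedder's registered
    projection; the seed is shared between embedder and detector."""
    rng = np.random.default_rng(seed)
    basis = rng.standard_normal((n_coeffs, block.size))
    return basis @ (block.ravel().astype(float) / 255.0)

def locate_detection_area(detected_block, registry):
    """registry: {area_id: fingerprint} stored at embedding time.
    Returns the area whose registered fingerprint is nearest, e.g., to
    tell area 1 from areas 3 and 4 in FIG. 8."""
    fp = area_fingerprint(detected_block)
    return min(registry, key=lambda aid: np.linalg.norm(registry[aid] - fp))
```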

The area or block position alternatively can be included in a watermark payload. For example, an ID or other indicator may indicate the location, or relative location, of the watermarked area.

System and Components

It is envisioned that the above processes, systems and system components can be implemented in a variety of computing environments and devices. It is specifically contemplated that the processes and components will be implemented within devices and across multiple devices. For example, signal capture, signature calculation and database entry and organization are performed on a set of devices to construct a recognition system, and signal capture, signature calculation and database search and retrieval are performed on another set of devices, which may be distinct or overlap.

The computing environments used to implement the processes and system components encompass a broad range, from general purpose, programmable computing devices to specialized circuitry, and devices including a combination of both. The processes and system components may be implemented as instructions for computing devices, including general purpose processor instructions for a variety of programmable processors, including microprocessors, Digital Signal Processors, electronic processors, etc. These instructions may be implemented as software, firmware, etc. These instructions can also be converted to various forms of processor circuitry, including programmable logic devices, application specific circuits, including digital, analog and mixed analog/digital circuitry. Execution of the instructions can be distributed among processors and/or made parallel across processors within a device or across a network of devices. Transformation of content signal data may also be distributed among different processor and memory devices.

The computing devices include, e.g., one or more processors, one or more memories (including computer readable media), input devices, output devices, and communication among these components (in some cases referred to as a bus). For software/firmware, instructions are read from computer readable media, such as optical, electronic or magnetic storage media, via a communication bus, interface circuit or network and executed on one or more processors.

The above processing of content signals includes transforming of these signals in various physical forms. Images and video (forms of electromagnetic waves traveling through physical space and depicting physical objects) may be captured from physical objects using cameras or other capture equipment, or generated by a computing device. Similarly, audio pressure waves traveling through a physical medium may be captured using an audio transducer (e.g., microphone) and converted to an electronic signal (digital or analog form). While these signals are typically processed in electronic and digital form to implement the components and processes described above, they may also be captured, processed, transferred and stored in other physical forms, including electronic, optical, magnetic and electromagnetic wave forms. The content signals are transformed during processing to compute signatures, including various data structure representations of the signatures as explained above. In turn, the data structure signals in memory are transformed for manipulation during searching, sorting, reading, writing and retrieval. The signals are also transformed for capture, transfer, storage, and output via display or audio transducer (e.g., speakers).

While reference has been made to mobile devices (like cell phones) and embedded systems, it will be recognized that this technology finds utility with all manner of devices—both portable and fixed. PDAs, organizers, portable music players, desktop computers, wearable computers, servers, etc., can all make use of the principles detailed herein. Particularly contemplated cell phones include the Apple iPhone, and cell phones following Google's Android specification (e.g., the G1 phone, manufactured for T-Mobile by HTC Corp.). The term "cell phone" should be construed to encompass all such devices, even those that are not strictly-speaking cellular, nor telephones. (Details of an iPhone, including its touch interface, are provided in published patent application 20080174570, which is hereby incorporated herein by reference.)

The design of cell phones and other computers that can be employed to practice the methods of the present disclosure is familiar to the artisan. In general terms, each includes one or more processors, one or more memories (e.g., RAM), storage (e.g., a disk or flash memory), a user interface (which may include, e.g., a keypad, a TFT LCD or OLED display screen, touch or other gesture sensors, a camera or other optical sensor, a microphone, etc., together with software instructions for providing a graphical user interface), a battery, and an interface for communicating with other devices (which may be wireless, such as GSM, CDMA, W-CDMA, CDMA2000, TDMA, EV-DO, HSDPA, WiFi, WiMax, or Bluetooth, and/or wired, such as through an Ethernet local area network, a T-1 internet connection, etc.). An exemplary cell phone that can be used to practice part or all of the detailed arrangements is shown in FIG. 3.

The processor can be a special purpose hardware device, or may be implemented by a programmable device executing software instructions read from a memory or storage, or by combinations thereof. (The ARM series of CPUs, using a 32-bit RISC architecture developed by Arm Limited, is used in many cell phones.) References to "processor" should thus be understood to refer to functionality, rather than any particular form of implementation.

In addition to implementation by dedicated hardware, or software-controlled programmable hardware, the processor can also comprise a field programmable gate array, such as the Xilinx Virtex series device. Alternatively the processor may include one or more digital signal processing cores, such as Texas Instruments TMS320 series devices.

Software instructions for implementing the detailed functionality can be readily authored by artisans, from the descriptions provided herein.

Typically, devices for practicing the detailed methods include operating system software that provides interfaces to hardware devices and general purpose functions, and also include application software that can be selectively invoked to perform particular tasks desired by a user. Known browser software, communications software, and media processing software can be adapted for uses detailed herein. Some embodiments may be implemented as embedded systems—a special purpose computer system in which the operating system software and the application software are indistinguishable to the user (e.g., as is commonly the case in basic cell phones). The functionality detailed in this specification can be implemented in operating system software, application software and/or as embedded system software.

Different portions of the functionality can be implemented on different devices. For example, in a system in which a cell phone communicates with a server at a remote service provider, different tasks can be performed exclusively by one device or the other, or execution can be distributed between the devices. For example, extraction of signatures from a test image on a cell phone, and searching of a database for corresponding reference images on a remote server, is one architecture, but there are many others. For example, information about reference images may be stored on the cell phone, allowing the cell phone to capture a test image, generate signatures, and compare against stored signature data structures for reference images, all without reliance on external devices. Thus, it should be understood that description of an operation as being performed by a particular device (e.g., a cell phone) is not limiting but exemplary; performance of the operation by another device (e.g., a remote server), or shared between devices, is also expressly contemplated. (Moreover, more than two devices may commonly be employed. E.g., a service provider may refer some tasks, functions or operations to servers dedicated to such tasks.) In like fashion, data can be stored anywhere: local device, remote device, in the cloud, distributed, etc.
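The device/server split contemplated above can be sketched as follows; this is a hedged illustration under stated assumptions, not the patent's own architecture, and the helper names, the on-device cache, and the endpoint URL are all hypothetical.

LOCAL_DB = {("sig-001",): "poster-A", ("sig-002",): "poster-B"}  # on-device signature cache

def match_locally(signature):
    # All reference data resides on the phone; no network needed.
    return LOCAL_DB.get(signature)

def match_remotely(signature):
    # In a real deployment this would be an HTTP/RPC round trip to the
    # service provider (e.g., POST https://example.invalid/lookup).
    # Stubbed here so the sketch stays self-contained and runnable.
    return None

def identify(signature, network_available):
    # Prefer the local cache; fall back to the remote server, mirroring
    # the distributed architectures contemplated in the text.
    return match_locally(signature) or (
        match_remotely(signature) if network_available else None)

print(identify(("sig-001",), network_available=False))  # -> poster-A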

Operations need not be performed exclusively by specifically-identifiable hardware. Rather, some operations can be referred out to other services (e.g., cloud computing), which attend to their execution by still further, generally anonymous, systems. Such distributed systems can be large scale (e.g., involving computing resources around the globe), or local (e.g., as when a portable device identifies nearby devices through Bluetooth communication, and involves one or more of the nearby devices in an operation).

Concluding Remarks

Having described and illustrated the principles of the technology with reference to specific implementations, it will be recognized that the technology can be implemented in many other, different forms. To provide a comprehensive disclosure without unduly lengthening the specification, applicants incorporate by reference the patents and patent applications referenced above, in their entireties.

The methods, processes, and systems described above may be implemented in hardware, software or a combination of hardware and software. For example, the signal processing operations described above may be implemented as instructions stored in a memory and executed in a programmable computer (including both software and firmware instructions), implemented as digital logic circuitry in a special purpose digital circuit, or a combination of instructions executed in one or more processors and digital logic circuit modules. The methods and processes described above may be implemented in programs executed from a system's memory (a computer readable medium, such as an electronic, optical or magnetic storage device). The methods, instructions and circuitry operate on electronic signals, or signals in other electromagnetic forms. These signals further represent physical signals like image signals (e.g., light waves in the visible spectrum) captured in image sensors. These electromagnetic signal representations are transformed to different states as detailed above to detect signal attributes, perform pattern recognition and matching, encode and decode digital data signals, calculate relative attributes of source signals from different sources, etc.

The above methods, instructions, and hardware operate on reference and suspect signal components. Because a signal can be represented as a sum of signal components formed by projecting the signal onto basis functions, the above methods generally apply to a variety of signal types. The Fourier transform, for example, represents a signal as a sum of the signal's projections onto a set of basis functions.
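For concreteness, the decomposition just mentioned can be written out in standard signal-processing notation (nothing here is specific to this disclosure): a signal x is reconstructed from its projections onto an orthonormal basis {phi_k}, and choosing complex exponentials as the basis yields the Fourier representation.

\[
  x(t) = \sum_{k} \langle x, \phi_k \rangle\, \phi_k(t),
  \qquad
  \phi_k(t) = \tfrac{1}{\sqrt{T}}\, e^{\,j 2\pi k t / T}
  \;\Longrightarrow\;
  x(t) = \sum_{k} c_k\, e^{\,j 2\pi k t / T},
  \quad
  c_k = \frac{1}{T} \int_{0}^{T} x(t)\, e^{-j 2\pi k t / T}\, dt .
\]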

The particular combinations of elements and features in the above-detailed embodiments are exemplary only; the interchanging and substitution of these teachings with other teachings in this and the incorporated-by-reference patents/applications are also contemplated.

What is claimed is:
1. A method of controlling a device, the device comprising a camera and a display screen, said method comprising: receiving image data captured by the camera; modifying received image data to compensate for distortion caused by capture positioning of the camera relative to an imaged subject; analyzing modified imagery to detect an encoded signal therefrom; extracting a digital fingerprint from the modified imagery, the digital fingerprint corresponding to an image area hosting the encoded signal; determining a relative spatial position of the image area based on the digital fingerprint; and providing the relative spatial position of the image area to an augmented reality (AR) system, in which the AR system overlays graphics or video on the display screen corresponding to the image area.

2. The method of claim 1 in which said extracting utilizes a linear projection.

3. The method of claim 1 in which said determining compares the digital fingerprint to a plurality of digital fingerprints, with each of the plurality of digital fingerprints corresponding to a different spatial image area of the received image data.

4. The method of claim 3 in which the AR system overlays graphics or video on the display screen corresponding to the different spatial image areas.

5. The method of claim 3 in which the plurality of digital fingerprints are stored in a remote server.

6. The method of claim 1 in which the encoded signal comprises orientation attributes, said method further comprising providing orientation information to the AR system, the orientation information obtained from analysis of the orientation attributes, in which the AR system utilizes the orientation attributes to overlay graphics or video on the display screen.

7. The method of claim 6 in which the encoded signal comprises digital watermarking.

8. The method of claim 1 in which the encoded signal comprises digital watermarking.

9. A method comprising: receiving imagery captured by a camera incorporated into a device, the device comprising a display screen; analyzing the imagery to decode an encoded signal therefrom, in which the encoded signal comprises orientation attributes; determining orientation information from the orientation attributes, the orientation information being associated with a capture position of the device sensor relative to an imaged subject; utilizing the orientation information as pose input for use by an augmented reality (AR) system, the pose input updating or resetting pose information being used by the AR system; and, based on spatial information provided by the AR system, displaying graphics or video over the imagery on the display screen.

10. The method of claim 9 in which the pose information utilizes key points within the imagery.

11. The method of claim 9 in which the pose input comprises an input to a homography generator.

12. The method of claim 9 further comprising: modifying the imagery based on the orientation information, said modifying yielding modified imagery; and extracting a digital fingerprint from a spatial area hosting the encoded signal within the modified imagery.

13. The method of claim 9 in which the encoded signal comprises digital watermarking.

14. An apparatus comprising: a camera for capturing image data; a display screen; and one or more processors configured for: transforming captured image data to compensate for distortion caused by capture positioning of said camera relative to an imaged subject; detecting an encoded signal from the transformed imagery; generating a digital fingerprint from the transformed imagery, the digital fingerprint corresponding to an image area hosting the encoded signal; determining a relative spatial position of the image area based on the digital fingerprint; and providing the relative spatial position of the image area to an augmented reality (AR) system, in which the AR system overlays graphics or video on said display screen corresponding to the image area.

15. The apparatus of claim 14 in which the generating utilizes a linear projection.

16. The apparatus of claim 15 in which said determining compares the digital fingerprint to a plurality of digital fingerprints, with each of the plurality of digital fingerprints corresponding to a different spatial image area of the captured image data.

17. The apparatus of claim 16 in which the AR system overlays graphics or video on the display screen corresponding to the different spatial image areas.

18. The apparatus of claim 16 in which the plurality of digital fingerprints are stored in a remote server.

19. The apparatus of claim 14 in which the encoded signal comprises orientation attributes, and in which said one or more processors are configured for providing orientation information to the AR system, the orientation information obtained from analysis of the orientation attributes, in which the AR system utilizes the orientation attributes to overlay graphics or video on the display screen.

20. The apparatus of claim 19 in which the encoded signal comprises digital watermarking.